Chapter 4 Extra skills
In this section of my portfolio I will show you some of the extra skills I have developed during my data science minor.
4.1 Writing an introduction using Zotero for references
Writing research papers is an important part of working in research, and a good introduction relies on references to other papers. Below you can see the introduction to my project about liquid biopsies, which makes use of multiple references.
Neuroblastoma is the fourth most common tumor in children and presents itself as either low-risk or high-risk neuroblastoma. To determine which risk group a tumor belongs to, a tumor biopsy is necessary (Weiser et al. 2019). However, these biopsies are very invasive and can only be done a few times, even though the tumor keeps mutating. To counter this problem, researchers have become increasingly interested in liquid biopsies as a way to track the mutations in the tumor. This is possible because the tumor often sheds DNA into the bloodstream, which can then be used for whole-exome sequencing (WES). These results can be compared with the DNA of the tumor to show how the tumor evolves over time (Chicard et al. 2018). Because tumor cells have different properties depending on which part of the tumor they come from, it is difficult to capture all mutations with a single tumor biopsy, which is biased toward the part of the tumor where it was taken. Liquid biopsies help with this problem as well, because they sequence all the DNA that the tumor has released, which makes it possible to spot mutations that differ between tumor DNA and cell-free DNA (cfDNA) (Van Paemel et al. 2022).
The type of mutation that gets the most attention is the copy number variation/aberration (CNV/CNA). CNVs can be very small, spanning just a few kilobases, or very large, covering a whole chromosome. CNVs are very important because they can give an indication of how pathogenic the tumor is, based on the genes the CNV contains, its position and its size (Riggs et al. 2020).
Unfortunately, some hospitals have no easy way to analyse the data from all these patients because they lack experience with data analysis. This makes analysing the data a slow and tedious task, even though it would be extremely beneficial for hospitals without data scientists to have programs available that analyse these results quickly (Valsesia et al. 2013). Such programs remove the time-consuming task of analysing every file manually, which gives staff time to focus on more important things such as treating the patient. Sharing these results with other hospitals is equally important, because there is not nearly enough data available to say with certainty how dangerous certain tumors are and whether they have been fully removed. With all the data combined, research on liquid biopsies can progress quickly, making diagnoses easier and more reliable.
In this project we will analyse WES data from tumor DNA and cfDNA, and make the analysis reproducible so that the Princess Máxima Center can easily analyse the data for all their patients.
To accomplish this we will mostly focus on:
– giving all the CNVs different IDs so they can be easily distinguished from each other.
– showing which cytogenetic band each CNV falls into.
– making an interactive plot to look up genes easily.
– automatically filtering genes that indicate high-risk neuroblastoma.
– making a high-throughput version that analyses multiple datasets at the same time, without having to insert each dataset manually.
– getting these results quickly and easily, so the researcher does not have to focus on how to analyse the data.
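As an illustration of the first goal, here is a minimal sketch of one possible way to give CNVs stable IDs in R. It is not the method used in the project: the helper `make_cnv_id`, the column names and the example rows are all made up for illustration.

```r
# Toy CNV table; column names and values are hypothetical.
cnvs <- data.frame(
  chromosome = c("1", "17", "2"),
  start      = c(1000000, 40000000, 15000000),
  end        = c(5000000, 81000000, 16500000),
  type       = c("loss", "gain", "gain"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build an ID from the type, chromosome and coordinates,
# so the same CNV gets the same ID in every sample it appears in.
# "%.0f" avoids scientific notation such as 1e+06 in the ID.
make_cnv_id <- function(chromosome, start, end, type) {
  sprintf("%s_chr%s_%.0f_%.0f", type, chromosome, start, end)
}

cnvs$cnv_id <- make_cnv_id(cnvs$chromosome, cnvs$start, cnvs$end, cnvs$type)

# The IDs are only useful if every CNV gets a different one.
stopifnot(anyDuplicated(cnvs$cnv_id) == 0)

print(cnvs[, c("cnv_id", "chromosome", "start", "end", "type")])
```

Building the ID from the coordinates, instead of numbering the rows, means the ID does not change when the table is sorted or filtered.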
4.2 Creating parameters for different data inputs
To show my ability to use parameters I will be using data from the ECDC. The data is available in this repository under “data/COVID_cases_31_05_2022.csv”.
# loading the packages used below
library(dplyr)
library(ggplot2)
library(plotly)
# loading in data
cases <- read.csv("data/COVID_cases_31_05_2022.csv")
# filtering the data with the params
cases_filtered <- cases %>%
  dplyr::filter(countriesAndTerritories == params$country,
                year == params$year,
                month >= params$period_start,
                month <= params$period_end)
# telling R the dateRep column is a date
cases_filtered$dateRep <- as.Date(cases_filtered$dateRep, format = "%d/%m/%Y")
# making a graph for cases
cases_graph <- cases_filtered %>%
  ggplot(aes(x = dateRep, y = cases)) +
  geom_point(size = .5) +
  geom_line() +
  labs(title = paste("Covid related cases from month", params$period_start, "to", params$period_end, "in", params$year, "for", params$country),
       x = "Month",
       y = "Covid related cases") +
  theme_classic()
ggplotly(cases_graph)
# making a graph for deaths
deaths_graph <- cases_filtered %>%
  ggplot(aes(x = dateRep, y = deaths)) +
  geom_point(size = .5) +
  geom_line() +
  labs(title = paste("Covid related deaths from month", params$period_start, "to", params$period_end, "in", params$year, "for", params$country),
       x = "Month",
       y = "Covid related deaths") +
  theme_classic()
ggplotly(deaths_graph)
If you want to recreate these graphs with different params, clone this repository and run this command in the console (with your own params, of course):
bookdown::render_book(params = list(country = "Netherlands", year = 2021, period_start = 5, period_end = 10))
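To check which rows a given set of params selects without rendering the whole book, the filter step can also be tried on its own. The sketch below is in base R and assumes nothing beyond the column names used above: the `params` list is a plain stand-in for the one R Markdown provides while knitting, the rows in `toy_cases` are made up and not ECDC data, and `filter_cases` is a hypothetical helper.

```r
# Stand-in for the `params` list that R Markdown provides while knitting;
# the values are the ones from the render_book() call above.
params <- list(country = "Netherlands", year = 2021,
               period_start = 5, period_end = 10)

# Made-up rows that only mimic the column names of the ECDC file.
toy_cases <- data.frame(
  dateRep = c("15/04/2021", "15/05/2021", "15/10/2021", "15/11/2021", "15/06/2021"),
  month   = c(4, 5, 10, 11, 6),
  year    = c(2021, 2021, 2021, 2021, 2021),
  cases   = c(10, 20, 30, 40, 50),
  countriesAndTerritories = c(rep("Netherlands", 4), "Belgium"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the same filter as above, written in base R.
filter_cases <- function(data, params) {
  keep <- data$countriesAndTerritories == params$country &
    data$year == params$year &
    data$month >= params$period_start &
    data$month <= params$period_end
  out <- data[keep, ]
  # Convert the date column so it can be plotted on a date axis.
  out$dateRep <- as.Date(out$dateRep, format = "%d/%m/%Y")
  out
}

selected <- filter_cases(toy_cases, params)
print(selected)
```

With these toy rows only the May and October entries for the Netherlands remain, because both ends of the period are inclusive.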